Asymptotic Minimaxity of False Discovery Rate Thresholding for Sparse Exponential Data
Abstract
Control of the False Discovery Rate (FDR) is an important development in multiple hypothesis testing, allowing the user to limit the fraction of rejected null hypotheses that are false rejections (i.e., false discoveries). The FDR principle can also be used in multiparameter estimation problems to set thresholds for separating signal from noise when the signal is sparse. Success has been proven when the noise is Gaussian; see [3]. In this paper, we consider the application of FDR thresholding to a non-Gaussian setting, in the hope of learning whether the good asymptotic properties of FDR thresholding as an estimation tool hold more broadly than just in the standard Gaussian model. We consider a vector (X_i), i = 1, ..., n, whose coordinates are independent exponential random variables with individual means μ_i. The vector μ is thought to be sparse, with most coordinates equal to 1 and a small fraction significantly larger than 1. This models a situation where most coordinates are simply 'noise', but a small fraction of the coordinates contain 'signal'. We develop an estimation theory working with log(μ_i) as the estimand, and use the per-coordinate mean-squared error in recovering log(μ_i) to measure risk. We consider minimax estimation over parameter spaces defined by constraints on the per-coordinate ℓ^p norm of log(μ_i): (1/n) ∑_{i=1}^{n} log^p(μ_i) ≤ η^p. Members of such spaces are vectors (μ_i) which are sparsely heterogeneous. We find that, for large n and small η, FDR thresholding is nearly minimax, increasingly so as η decreases. The FDR control parameter 0 < q < 1 plays an important role: when q ≤ 1/2, the FDR estimator is nearly minimax, while choosing a fixed q > 1/2 prevents near minimaxity. These conclusions mirror those found in the Gaussian case in [3]. The techniques developed here seem applicable to a wide range of other distributional assumptions, other loss measures, and non-i.i.d. dependency structures.
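To make the thresholding idea concrete, here is a minimal Python sketch of FDR thresholding for exponential data. It is an illustration under stated assumptions, not the paper's exact procedure: it assumes the null distribution is exponential with mean 1 (so the tail probability of an observation x is exp(-x)), applies the standard Benjamini-Hochberg step-up rule to those tail probabilities, and estimates log(μ_i) by log(X_i) for coordinates above the threshold and by 0 otherwise. The function names `fdr_threshold` and `fdr_estimate_log_mu` are hypothetical, as is the fallback threshold used when nothing is rejected.

```python
import math
import random


def fdr_threshold(x, q):
    """Data-driven threshold from the Benjamini-Hochberg step-up rule.

    Assumption: under the null, each X_i is exponential with mean 1,
    so its p-value is exp(-X_i). The BH condition p_(k) <= q*k/n is
    then equivalent to X_(k) >= log(n / (q*k)), where X_(k) is the
    k-th largest observation.
    """
    n = len(x)
    xs = sorted(x, reverse=True)
    k_hat = 0
    for k in range(1, n + 1):
        if xs[k - 1] >= math.log(n / (q * k)):
            k_hat = k  # keep the largest k that satisfies the condition
    if k_hat == 0:
        # Nothing is rejected; fall back to the strictest quantile
        # (an illustrative choice, not taken from the paper).
        return math.log(n / q)
    return math.log(n / (q * k_hat))


def fdr_estimate_log_mu(x, q):
    """Hard-threshold estimate of log(mu_i): log(X_i) above the threshold, else 0."""
    t = fdr_threshold(x, q)
    return [math.log(xi) if xi >= t else 0.0 for xi in x]


if __name__ == "__main__":
    random.seed(0)
    n, n_signal = 1000, 10
    # Sparse means: most coordinates are 1, a few are much larger.
    mu = [50.0] * n_signal + [1.0] * (n - n_signal)
    x = [random.expovariate(1.0 / m) for m in mu]
    est = fdr_estimate_log_mu(x, q=0.25)
    mse = sum((e - math.log(m)) ** 2 for e, m in zip(est, mu)) / n
    print("threshold:", fdr_threshold(x, 0.25))
    print("per-coordinate squared error:", mse)
```

The demo reports the per-coordinate squared error in recovering log(μ_i), the risk measure named in the abstract. Because the threshold adapts to how many large observations are present, it drops when more coordinates carry signal and rises toward log(n/q) when almost none do.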
Similar resources
Adapting to Unknown Sparsity by controlling the False Discovery Rate
We attempt to recover an n-dimensional vector observed in white noise, where n is large and the vector is known to be sparse, but the degree of sparsity is unknown. We consider three different ways of defining sparsity of a vector: using the fraction of nonzero terms; imposing power-law decay bounds on the ordered entries; and controlling the lp norm for p small. We obtain a procedure which is ...
Adapting to Unknown Sparsity by Controlling the False Discovery Rate by Felix Abramovich, Yoav Benjamini
We attempt to recover an n-dimensional vector observed in white noise, where n is large and the vector is known to be sparse, but the degree of sparsity is unknown. We consider three different ways of defining sparsity of a vector: using the fraction of nonzero terms; imposing power-law decay bounds on the ordered entries; and controlling the lp norm for p small. We obtain a procedure which is a...
Bayesian Multiple Testing under Sparsity for Polynomial-tailed Distributions
This paper considers Bayesian multiple testing under sparsity for polynomial-tailed distributions satisfying a monotone likelihood ratio property. Included in this class of distributions are the Student’s t, the Pareto, and many other distributions. We prove some general asymptotic optimality results under fixed and random thresholding. As examples of these general results, we establish the Bay...
A Simple Forward Selection Procedure Based on False Discovery Rate Control
We propose the use of a new false discovery rate (FDR) controlling procedure as a model selection penalized method, and compare its performance to that of other penalized methods over a wide range of realistic settings: nonorthogonal design matrices, moderate and large pool of explanatory variables, and both sparse and nonsparse models, in the sense that they may include a small and large fract...
The False Discovery Rate in Simultaneous Fisher and Adjusted Permutation Hypothesis Testing on Microarray Data
Background and Objectives: In recent years, new technologies have led to the production of large amounts of data, and in the field of biology, microarray technology has also developed dramatically. Meanwhile, the Fisher test is used to compare the control group with two or more experimental groups and also to detect differentially expressed genes. In this study, the false discovery rate was investiga...
Publication date: 2004